Skip to content

Conversation

@MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Aug 1, 2019

What changes were proposed in this pull request?

In the PR, I propose to re-implement DateTimeUtils.truncTimestamp and use Java 8 time API to adjust timestamps to YEAR, MONTH, QUARTER, WEEK, DAY, HOUR, MINUTE and SECOND. This simplifies implementation and will allow to easily support other level of truncation, see https://issues.apache.org/jira/browse/SPARK-28017.

Also I pass ZoneId instances to the function instead of TimeZone instances to avoid unnecessary conversions.

How was this patch tested?

By existing test suites. I am going to re-run the DateTime benchmark.

@SparkQA
Copy link

SparkQA commented Aug 1, 2019

Test build #108526 has finished for PR 25329 at commit a8008bc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks plausible to me.

@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 4, 2019

Unfortunately, benchmarks show that this implementation is 2-3 times slower than current one. Most of the time, it spends inside of resolving zone ids to zone offsets. For example, the truncTimestamp method is bounded by the binary search by historical zone infos in DateTimeBenchmark:
Screen Shot 2019-08-04 at 10 43 30
I have changed the benchmark to avoid the search on zone history but truncTimestamp still spends most of the time in zone conversions. For recent year - 2019, it is bound by getting zone info from an internal cache:
Screen Shot 2019-08-04 at 10 52 16

I will close the PR since I don't know how to avoid the zone conversions. Maybe in newer versions of JDK, this will be faster, and we will come back to the changes again.

@dongjoon-hyun
Copy link
Member

Thank you for the benchmark, @MaxGekk . I also didn't expect the slowdown. When you close this PR, could you add a comment about this? (You can point the above comment.) That would be helpful when we revisit this with JDK11.

@MaxGekk MaxGekk changed the title [WIP][SPARK-28596][SQL] Use Java 8 time API in the date_trunc() function [SPARK-28596][SQL] Use Java 8 time API in the date_trunc() function Aug 5, 2019
@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 5, 2019

I linked this to Build and Run Spark on JDK11

@MaxGekk
Copy link
Member Author

MaxGekk commented Aug 5, 2019

I am closing this PR because new implementation based on Java 8 API slows down date/timestamp truncations up to 3 times ( see details #25329 (comment)). SPARK-28596 has been linked to SPARK-24417 to try it on JDK11.

@MaxGekk MaxGekk closed this Aug 5, 2019
@dongjoon-hyun
Copy link
Member

Thank you for linking to JDK11. That will remind us definitely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants